
@DavisVaughan (Member) commented on Oct 24, 2025

Part of #1933

In #696 we introduced some admittedly hacky machinery that mimicked R's internal ALTREP methods table. It gave us access to an ALTREP class's `Extract_subset` method so that we could call it directly from `vec_slice()` and avoid materializing ALTREP vectors.

That code used `STDVEC_DATAPTR()`, which is not part of R's API. `DATAPTR()` is also non-API, though `DATAPTR_RO()` seems to be allowed. In theory we could just swap in `DATAPTR_RO()`, but I wanted to revisit this because the code is fairly brittle and R Core would probably not want us doing this.

Rather than trying to call the ALTREP `Extract_subset` method directly, I looked at the code paths in R itself that end up calling it. Both `[` and `.subset()` get you there. I'd like to avoid `[` because it dispatches to S3 methods, and I don't want to risk an endless loop that calls back into `vctrs::vec_slice()`. `.subset()` performs no method dispatch, so it seems very useful here.

This PR switches `vec_slice()` to always go through `.subset()` for ALTREP objects.

A notable difference: previously, if a class had no `Extract_subset` method, we could fall back to the vctrs slicing code. Now we fall back to base R's slicing code, which calls the class's `Elt` method repeatedly. I don't think there is any safe, approved way to keep the previous behavior.
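To illustrate the fallback difference, here is a minimal C sketch of its general shape: try the class's own subset method, and if the class has none (or it declines), call the per-element accessor once for each selected index. Everything in it (`toy_vec`, `toy_class`, `toy_subset()`, the compact "rep" payload) is a hypothetical stand-in, not R's real ALTREP API and not vctrs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a vector class; names and layout are
   illustrative only and much simpler than R's real ALTREP method table. */
typedef struct toy_vec toy_vec;

typedef struct {
  /* Optional fast path, analogous to an ALTREP Extract_subset method.
     Returns 1 after filling `out`, or 0 to decline. */
  int (*extract_subset)(toy_vec* x, const size_t* idx, size_t n, int* out);
  /* Required per-element accessor, analogous to an ALTREP Elt method. */
  int (*elt)(toy_vec* x, size_t i);
} toy_class;

struct toy_vec {
  const toy_class* cls;
  int value;         /* compact "rep" payload: every element equals `value` */
  size_t length;
  size_t elt_calls;  /* instrumentation: number of elt() calls so far */
};

static int rep_elt(toy_vec* x, size_t i) {
  assert(i < x->length);
  x->elt_calls++;
  return x->value;
}

static int rep_extract_subset(toy_vec* x, const size_t* idx, size_t n, int* out) {
  /* Answers straight from the compact payload; elt() is never called. */
  for (size_t k = 0; k < n; k++) {
    assert(idx[k] < x->length);
    out[k] = x->value;
  }
  return 1;
}

/* Try the class's subset method first; otherwise fall back to calling
   elt() once per selected element. */
static void toy_subset(toy_vec* x, const size_t* idx, size_t n, int* out) {
  const toy_class* cls = x->cls;
  if (cls->extract_subset != NULL && cls->extract_subset(x, idx, n, out)) {
    return;
  }
  for (size_t k = 0; k < n; k++) {
    out[k] = cls->elt(x, idx[k]);
  }
}

static const toy_class rep_with_subset = { rep_extract_subset, rep_elt };
static const toy_class rep_without_subset = { NULL, rep_elt };
```

With `rep_with_subset`, `toy_subset()` never calls `elt()`; with `rep_without_subset`, selecting three elements makes three `elt()` calls, which is the per-element cost of the fallback.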


Here is the example from #696 (comment), reproduced with this PR:

```r
library(vroom)
x <- vroom(vroom_example("mtcars.csv"), altrep = TRUE)[1:3]
#> Rows: 32 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): model
#> dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vroom_str(x)
#> 'spec_tbl_df', 'tbl_df', 'tbl', and 'data.frame': 32 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:32   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:32   materialized:false
#> $cyl:    altrep:true type:vroom::vroom_dbl   length:32   materialized:false

# calling directly
vroom_str(vctrs::vec_slice(x[[1]], 1:5))
#> altrep:true  type:vroom::vroom_chr   length:5    materialized:false

# implicitly by dev tibble
vroom_str(x[x$cyl == 4, ])
#> 'spec_tbl_df', 'tbl_df', 'tbl', and 'data.frame': 11 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:11   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:11   materialized:false
#> $cyl:    altrep:false    type: double    length:11

# implicitly by dev dplyr
vroom_str(dplyr::filter(x, cyl == 4))
#> 'spec_tbl_df', 'tbl_df', 'tbl', and 'data.frame': 11 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:11   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:11   materialized:false
#> $cyl:    altrep:false    type: double    length:11

# `cyl` materialized by the `==` above, but everything else is still good!
vroom_str(x)
#> 'spec_tbl_df', 'tbl_df', 'tbl', and 'data.frame': 32 obs., 3 vars.:
#> $model:  altrep:true type:vroom::vroom_chr   length:32   materialized:false
#> $mpg:    altrep:true type:vroom::vroom_dbl   length:32   materialized:false
#> $cyl:    altrep:true type:vroom::vroom_dbl   length:32   materialized:true
```
Created on 2025-10-24 with reprex v2.1.1

@DavisVaughan (Member, Author):

Can delete all of this hackery now!

Comment on lines -60 to +62
```diff
-    r_obj* alt_subscript = KEEP(vec_subscript_materialize(subscript)); \
-    r_obj* out = ALTVEC_EXTRACT_SUBSET_PROXY(x, alt_subscript, r_null); \
-    FREE(1); \
-    if (out != NULL) { \
-      return out; \
-    } \
-  } \
-  if (is_compact_rep(subscript)) { \
+    return vec_slice_altrep(x, subscript); \
+  } else if (is_compact_rep(subscript)) { \
```
@DavisVaughan (Member, Author):

Note that ALTREP objects can no longer fall through to the vctrs slicing code.
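A toy C function makes this consequence concrete. It reproduces only the branch order visible in the snippet above; `choose_slice_path()` and the `slice_path` labels are hypothetical, and every vctrs branch other than the compact-rep one is collapsed into a single label.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the routes a slice can take. */
typedef enum {
  SLICE_PATH_ALTREP,       /* vec_slice_altrep(): defer to base R's .subset() */
  SLICE_PATH_COMPACT_REP,  /* vctrs loop specialised for compact-rep subscripts */
  SLICE_PATH_VCTRS_OTHER   /* any other vctrs slicing loop */
} slice_path;

/* Mirrors the new branch order: an ALTREP `x` is routed to base R before
   any vctrs loop is considered, unless materialization was requested.
   There is no route from an ALTREP `x` back into the vctrs loops when
   `materialize` is false. */
static slice_path choose_slice_path(bool materialize, bool x_is_altrep,
                                    bool subscript_is_compact_rep) {
  if (!materialize && x_is_altrep) {
    return SLICE_PATH_ALTREP;
  } else if (subscript_is_compact_rep) {
    return SLICE_PATH_COMPACT_REP;
  }
  return SLICE_PATH_VCTRS_OTHER;
}
```

For example, `choose_slice_path(false, true, false)` returns `SLICE_PATH_ALTREP`; an ALTREP `x` only reaches a vctrs loop when `materialize` is true.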

Comment on lines 145 to 149
```diff
 #define SLICE_BARRIER(RTYPE, CONST_DEREF, SET, NA_VALUE) \
-  if (is_compact_rep(subscript)) { \
+  if (!materialize && ALTREP(x)) { \
+    return vec_slice_altrep(x, subscript); \
+  } else if (is_compact_rep(subscript)) { \
     SLICE_BARRIER_COMPACT_REP(RTYPE, CONST_DEREF, SET, NA_VALUE); \
```
@DavisVaughan (Member, Author):

Notably, we weren't actually doing anything with ALTREP in the barrier path for characters and lists.

But all of our tests use a character ALTREP vector, so they never actually hit the ALTREP code path, and slicing was materializing the test vector.

I've added extra checks to our tests confirming that `x` is still not materialized after the `vec_slice(x)` call; these checks would previously have failed.
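The new checks boil down to asserting that slicing leaves a lazy vector compact. The C sketch below is a hypothetical toy, not vctrs' test code: `lazy_vec`, `lazy_dataptr()`, and both slice functions are made up, and it assumes that the old character/list route read elements through a full data pointer, which is what forced materialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LAZY_VEC_MAX 16

/* Hypothetical lazy vector: one compact value plus a buffer that is
   only filled ("materialized") when a full data pointer is requested. */
typedef struct {
  int value;                 /* every element equals `value` */
  size_t length;
  bool materialized;         /* true once the buffer has been filled */
  int buffer[LAZY_VEC_MAX];  /* toy storage used only after materialization */
} lazy_vec;

/* Stand-in for requesting a raw data pointer: expands the whole vector. */
static const int* lazy_dataptr(lazy_vec* x) {
  assert(x->length <= LAZY_VEC_MAX);
  if (!x->materialized) {
    for (size_t i = 0; i < x->length; i++) {
      x->buffer[i] = x->value;
    }
    x->materialized = true;
  }
  return x->buffer;
}

/* Assumed old route: index through the data pointer (materializes `x`). */
static void slice_via_dataptr(lazy_vec* x, const size_t* idx, size_t n, int* out) {
  const int* p = lazy_dataptr(x);
  for (size_t k = 0; k < n; k++) {
    assert(idx[k] < x->length);
    out[k] = p[idx[k]];
  }
}

/* New route: the class answers from its compact payload (`x` untouched). */
static void slice_via_class_method(const lazy_vec* x, const size_t* idx, size_t n, int* out) {
  for (size_t k = 0; k < n; k++) {
    assert(idx[k] < x->length);
    out[k] = x->value;
  }
}
```

Mirroring the new test expectation, `materialized` stays `false` after `slice_via_class_method()` and becomes `true` after `slice_via_dataptr()`.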

Base R now also supports ALTREP lists (thanks to Gabor), so those are supported by this change as well.

Comment on lines 448 to +449
```diff
 test_that("vec_slice() works with Altrep classes with custom extract methods", {
-  x <- .Call(vctrs_altrep_rle_Make, c(foo = 10L, bar = 5L))
+  x <- chr_rle(foo = 10L, bar = 5L)
```
@DavisVaughan (Member, Author):

Note that this only tests the barrier ALTREP path (character / list). We don't have an ALTREP test class for the non-barrier path (such as an ALTREP integer rep).

We could make one if we feel we need the coverage.

@DavisVaughan requested a review from @lionel- on October 24, 2025, at 20:02.